[LLADA2] Fix llada2 review #13598 by kashif · Pull Request #13698 · huggingface/diffusers

kashif · 2026-05-08T09:25:52Z

What does this PR do?

Fix the issues raised in #13598

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Fixes the six in-scope issues raised in the llada2 model/pipeline review:

1. Carry tokenizer `attention_mask` through `_prepare_input_ids` and add an `attention_mask` arg to `__call__` for pre-tokenized inputs. The runtime mask now reflects prompt padding and zeros out the block-aligned tail past `prompt_length + gen_length` instead of treating those positions as valid context.
2. Thread the per-call `block_length` into `BlockRefinementScheduler.set_timesteps` so the transfer schedule matches the requested block size (previously the scheduler only read its constructor default).
3. Drop `x0`/`x0_p`/`confidence` from `_callback_tensor_inputs` (never bound locals) and bind `sampled_tokens`, `sampled_probs`, `editing_transfer_index`, `active_block` so all advertised callback keys resolve.
4. Allow EOS exactly at index `prompt_length` (the first generated position) to mark a row finished.
5. Freeze rows that have already emitted EOS so subsequent block refinement doesn't extend them, and trim per-row at decode (previously gated on `batch_size == 1`) so post-EOS positions don't leak into decoded text.
6. Stop calling `self.set_progress_bar_config(...)` from inside `__call__`; build a local config dict for the inner block bar so user-supplied flags (in particular `disable=True`) survive the call.

Adds regression tests pinning each of the six fixes.
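To make fix 1 concrete, here is a minimal plain-Python sketch of the mask layout the description implies. It is not the pipeline's code: the function name `build_runtime_mask` is hypothetical, the real pipeline operates on tensors, and rounding the working length up to a whole number of blocks is an assumption drawn from the phrase "block-aligned tail".

```python
import math


def build_runtime_mask(prompt_mask, gen_length, block_length):
    """Hypothetical helper: extend a tokenizer attention mask for generation.

    prompt_mask is a list of equal-length rows of 0/1 values, as a tokenizer
    would return for a padded batch of prompts.
    """
    prompt_length = len(prompt_mask[0])
    valid_length = prompt_length + gen_length
    # Assumption: the working sequence is padded up to a multiple of block_length.
    total_length = math.ceil(valid_length / block_length) * block_length

    rows = []
    for row in prompt_mask:
        if len(row) != prompt_length:
            raise ValueError("all prompt mask rows must have the same length")
        generated = [1] * gen_length                 # positions the model may fill
        tail = [0] * (total_length - valid_length)   # block-aligned padding, masked out
        rows.append(list(row) + generated + tail)
    return rows


mask = build_runtime_mask([[0, 1, 1], [1, 1, 1]], gen_length=3, block_length=4)
for row in mask:
    print(row)
```

The first row keeps its padding zero from the tokenizer, and both rows end in zeros for the two positions past `prompt_length + gen_length`.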
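Fixes 4 and 5 can be illustrated with a similar plain-Python sketch of per-row trimming at decode time. The helper name `trim_at_eos` is hypothetical, and dropping the EOS token itself from the output is an assumption; the PR text only says that post-EOS positions must not leak into decoded text.

```python
def trim_at_eos(sequences, prompt_length, eos_token_id):
    """Hypothetical helper: return each row's generated tokens, cut at the first EOS."""
    trimmed = []
    for row in sequences:
        generated = list(row[prompt_length:])
        if eos_token_id in generated:
            # index() finds the first EOS, including one at the very first
            # generated position (index prompt_length in the full row).
            generated = generated[: generated.index(eos_token_id)]
        trimmed.append(generated)
    return trimmed


batch = [
    [5, 6, 2, 9, 9],  # EOS (id 2) is the first generated token
    [5, 6, 7, 2, 9],  # EOS after one generated token
    [5, 6, 7, 8, 9],  # no EOS
]
print(trim_at_eos(batch, prompt_length=2, eos_token_id=2))
```

Each row is trimmed independently, so the result does not depend on the batch size.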

HuggingFaceDocBuilderDev · 2026-05-08T09:36:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dg845

Thanks for the PR! Left one comment.

* [LLaDA2] address review findings from huggingface#13598

  Fixes the six in-scope issues raised in the llada2 model/pipeline review (the same six fixes listed in the PR description above) and adds regression tests pinning each of them.

* fix formatting
* undo changes
* set block_length to optional and use scheduler's default

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
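The last commit, "set block_length to optional and use scheduler's default", describes a common fallback pattern. The sketch below shows that pattern in isolation. `ToyScheduler` and `resolve_block_length` are hypothetical stand-ins, and where the real `BlockRefinementScheduler` stores its default is not shown in this thread.

```python
from typing import Optional


class ToyScheduler:
    """Hypothetical stand-in for a scheduler that carries a default block length."""

    def __init__(self, block_length: int = 32):
        self.block_length = block_length


def resolve_block_length(block_length: Optional[int], scheduler: ToyScheduler) -> int:
    """Prefer the per-call value; fall back to the scheduler's default when None."""
    if block_length is None:
        return scheduler.block_length
    if block_length <= 0:
        raise ValueError("block_length must be a positive integer")
    return block_length


scheduler = ToyScheduler(block_length=32)
print(resolve_block_length(None, scheduler))  # falls back to the scheduler default
print(resolve_block_length(8, scheduler))     # per-call value takes precedence
```

Resolving the value once and passing it on to both the mask construction and the schedule setup would keep the two consistent, which is the mismatch fix 2 addresses.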

kashif added 2 commits May 8, 2026 09:12

fix formatting

fa61a3b

github-actions Bot added the fixes-issue, size/L (PR with diff > 200 LOC), models, tests, pipelines, and schedulers labels May 8, 2026

undo changes

e094299

github-actions Bot removed the models label May 9, 2026

kashif requested a review from dg845 May 10, 2026 15:39

Merge branch 'main' into fix-llada2-review-13598

6f69616

dg845 reviewed May 16, 2026

Comment thread src/diffusers/pipelines/llada2/pipeline_llada2.py

dg845 approved these changes May 16, 2026

set block_length to optional and use scheduler's default

92376f4

kashif merged commit 79de306 into huggingface:main May 17, 2026
13 of 15 checks passed

kashif merged 5 commits into huggingface:main from kashif:fix-llada2-review-13598

3 participants
